DOCUMENT RESUME

ED 171 802                                        TM 009 509

AUTHOR        Brandenburg, Dale C.
TITLE         Some Statistical Properties of Item Specificity in
              Student Ratings.
PUB DATE      Apr 79
NOTE          29p.; Paper presented at the Annual Meeting of the
              National Council on Measurement in Education (San
              Francisco, California, April, 1979)

EDRS PRICE    MF01/PC02 Plus Postage.
DESCRIPTORS   *Classification; Content Analysis; *Course
              Evaluation; Higher Education; *Item Analysis;
              Questionnaires; Rating Scales; *Student Evaluation of
              Teacher Performance; *Test Items; Test Reliability;
              Test Validity
IDENTIFIERS   *Item Specificity

ABSTRACT
Prior research has indicated that items administered
to college students for rating their instructors can be empirically
as well as logically classified on a continuum from very general to
specific. Three of these hypothesized classifications of item
specificity--global, general concept, and specific--were chosen to
represent this continuum. Thirty-nine Likert-type items empirically
identified as members of these categories, and as members of a
content domain labeled influence and security, were then compared
against six statistical properties: (1) skewness in distribution of
class section means; (2) between-class variance of means; (3)
within-class variance among student responses; (4) ceiling effect;
(5) item reliability; and (6) interquartile range. Results indicated
that most items met the criteria hypothesized, although some
discrepancies for the most specific items were pronounced. The
differentiation among specificity levels offered an essentially
content-free classification scheme. Implications were drawn for
questionnaire item writing, use of results, and the evaluation of
overall item quality. (Author/CP)





Some Statistical Properties of Item Specificity
In Student Ratings*
Dale C. Brandenburg
University of Illinois at Urbana-Champaign




Prior research has indicated that student rating items can be empirically as well
as logically classified on a continuum from very general to specific. Three of
these hypothesized classifications of item specificity (Global, General Concept
and Specific) were chosen to represent this continuum. Items empirically identified
as members of these categories were studied to discover further descriptive
statistical behavior of item results and were tested against the selected item criteria.
Results indicated that most items met the criteria hypothesized although some
discrepancies for the most specific items were pronounced. Implications were drawn
for questionnaire item writing, use of results and evaluation of overall item
quality.






*Paper presented at 1979 NCME Annual Meeting, San Francisco, California, April, 1979.



Some Statistical Properties of Item Specificity
In Student Ratings

Dale C. Brandenburg
University of Illinois at Urbana-Champaign

The literature on students' ratings of college instructors and courses is
abundant, ranging from correlational studies of how different variables influence
ratings to investigations of interrelationships among the ratings themselves.
Little attention, on the other hand, has been given to descriptive properties of
items and subscales except for data included in technical reports of questionnaires
(and these reports are not widely distributed). Furthermore, recent work by
Brandenburg, Derry & Hengstler (1978) and Frey (1978) has indicated that one may
have to examine more closely the specificity and general dimensions of items in
order to draw conclusions from general correlational studies.

An important outcome of the Brandenburg, et al. (1978) study was that rating
items could be empirically classified according to specificity. This classification
by specificity for the most part was hypothesized by Smock and Crooks (1973) and
by Rosenshine (1970). Specificity refers to the general to specific terminology
used to word the items or, alternatively, the amount of inference or judgment
required on the part of students to respond to a given item. More generally stated
items require higher inference or more judgment; more specifically or behaviorally
stated items require less inference or less judgment. Labels for these items from
most general to specific were given as Global, General Concept and Specific in the
Brandenburg, et al. (1978) study.

These levels of item specificity may also be related to how results from the
different item types may be used. More general items provide information appropriately
used in administrative (tenure, merit, etc.) decisions; specific items provide
information more appropriately used in giving feedback related to instructional
improvement. Because general item results are used comparatively, norms are useful
if not mandatory in order to derive less ambiguous interpretations. On the other
hand, specific item results are not usually used comparatively so norms are not
necessary, or in some cases would even be misleading.

This concept of specificity has been made potentially more complex with the
recent work of Brandenburg, et al. (1978) and Frey (1978). These studies make the
claim for two higher-order domains of items in which Frey argues that the item
members of one domain may be potentially more influenced by biasing effects such as
class size, required/elective nature of course, and course level than are the item
members of the second domain (Frey, 1978). Additionally, items in the second domain
appear to correlate more highly with student achievement and faculty publication
records. Frey calls the first domain "Support" and the second "Pedagogical Skill";
in Brandenburg, et al. (1978) the names were "Security" and "Influence," respectively.
While the names of these two domains differ between the two studies and the items are
slightly different, the constructs remain essentially the same. More importantly,
the implication advocated for other studies is that a conclusion such as "it was
found that the student ratings differed significantly by course level," drawn without
having examined the higher-order domain to which such item(s) belong, is probably
improper and likely to be invalid for all items.

In the Brandenburg, et al. (1978) study, it was shown that item specificity
categories could be identified within each of these higher-order domains. Research
using similar well-defined specific vs. more general items is sparse. Pohlmann (1975)
used primarily General Concept items to predict results on a Global item, and the
outcome was generally positive. Cushman and Tom (1976) developed a set of "specific"
teaching behaviors and correlated them to student perceptions of their progress on
certain course objectives. While their resulting questionnaire contained a number of
good psychometric qualities, its overall utility was limited due to the specific nature
of the questions. Rosenshine and Stevens (Note 1) correlated results on subgroups
of specific items to more general questions related to each subgroup. Specific
items that had low correlations with more general items were considered irrelevant
for future considerations in student ratings. All of these studies were correlational
and none attempted to define further descriptive characteristics of the general or
more specific items.

If one assumes that the concept of item specificity has some practical utility,
there should exist some descriptive statistical properties of item types which may
permit further item differentiation. Conversely, the assumptions regarding item
specificity imply that certain statistical properties of items must exist for them
to be classified in a given manner. For example, general items used for comparative
purposes must have sufficient between-class-section variance in order to differentiate
among instructors or classes. Specific items, on the other hand, might be hypothesized
to show relatively small within-class variance, implying that students are responding
to the same observed behavior, activity or trait. The objective of this investigation
was to hypothesize some statistical properties related to item specificity and
test their adequacy on some recently collected student rating data. A secondary
purpose was to illustrate some descriptive characteristics of "good" items within a
given classification. Such an examination should assist future student rating
questionnaire developers in improving overall item construction and utilization
procedures.

Definition of Terminology 

Items selected for testing statistical properties were chosen from the Instructor
and Course Evaluation System (ICES, Brandenburg, Note 2) Item Catalog. ICES is a
flexible, computer-based, cafeteria-type mechanism for collecting student ratings of
instructors, developed and used at the University of Illinois at Urbana-Champaign.
The general parameters of this system permit instructors to select up to 23 items
from the catalog, which are computer-printed on a scannable answer sheet.



The ICES Item Catalog contains about 450 items which are classified by content
and item specificity. In the Brandenburg, et al. (1978) study it was found that
items could be grouped into two large domains or halo dimensions labeled Influence
and Security. In terms of item content, Influence consists primarily of items
related to Student-Perceived Outcomes, Instructor Communication Skills, and
Instructor/Course Stimulation or Motivation. Security item content consists primarily
of items related to Course Management/Structure and Instructor Warmth/Concern.
Course Difficulty or Workload is negatively related to Security and positively
related to Influence.

The interaction of the Influence and Security domains with item specificity is
depicted in Figure 1. The large ellipse in the figure represents the content universe
of student rating items. The two smaller ellipses within it represent the item
domains of Influence and Security, which are further composed of Specific, General
Concept and Global items. These labels for specificity represent the most specific
to the most general, respectively. Unique items are those which do not relate
in any systematic way to either content categories or general domains. Examples
follow:

Insert Figure 1 About Here

Global

#164 -- Do you feel course objectives were accomplished?
Yes, to a great extent    No, not at all

General Concept

#3 -- The course was:
Organized    Disorganized

Specific

#265 -- The instructor made use of alternative explanations when needed.
Almost always    Almost never

Unique

#228 -- I went to sleep in class:
Very often    Never

Global items are characterized as being equal or nearly equal members of both
Influence and Security domains, and their number is probably finite and small.
General Concept items are seen as indicator items, the results of which can
illustrate general strengths or weaknesses of identified constructs such as speaking
ability, course organization or sensitivity to students. Commonly, these items
will load highly on one factor in a factor analytic study defining a trait or
construct in student ratings, whereas Global items would typically load on two or
more factors. As opposed to Global and General Concept items, the number
of Specific items is potentially infinite. Specific items, however, should be
related to a General Concept item within the same construct or factor; otherwise
their utility is limited. This serves, then, to differentiate Specific from
Unique items in that Unique items would not be related to any identifiable
General Concept items (or factors).

Procedure

Data for the investigation were obtained from summary results for each class
section utilizing ICES questionnaires during the 1977-78 academic year. Overall
usage was about 5,000 class sections, but only 2,200 actually used the item
catalog (instructors could also opt for pre-designed questionnaires).

All items used in this investigation contained five response positions with
only end points labeled. Two item examples are:

#101 -- The grading procedures for the course were:
Very fair    Very unfair

#241 -- Was the instructor a good speaker?
Yes, very good    No, rather poor

Items chosen for this investigation were previously empirically identified in
the Brandenburg, et al. (1978) study as belonging to certain specificity categories
and higher-order domains (Influence and Security). A list of the selected
items within their classified dimensions is given in Table 1, and it includes 8 Global,
14 General Concept and 17 Specific items.

Insert Table 1 About Here 

A second aspect of the procedure was to generate hypotheses concerning statistical
properties which may be differentiable according to the previously delineated item
types. Two sources of information were used to generate hypotheses: the theoretical
distinction among item types related to item construction and purpose, and the
experience of the author. Many of the hypotheses relate to an evaluation of item
quality. For example, Global and General Concept items should have sufficient
reliability to discriminate among instructors, but item reliability in this sense is
irrelevant to Specific items.

The following loosely structured hypotheses were generated: 
1. Skewness in distribution of class section means: Means for Global and
General Concept items should be negatively skewed, Globals more so than
General Concept. Specific item results are likely to be flatter, but also
negatively skewed.

Rationale: Negative skewness is a rather apparent characteristic of rating
data for both within-class and between-class response distributions. With
very few exceptions most questionnaires presently used contain almost
exclusively Global and General Concept items, thus negative skewness is
very commonly observed. While Specific items may also be negatively skewed,
general halo effects should be less pronounced.

Measures: $\bar{X}_B - \mathrm{Mdn}_B$ (from overall item mean distributions)



2. Between-class variance of means: Variance should be larger for Global than
for General Concept. Specific items may be expected to have slightly
smaller variance than General Concept items, although some Specific item
variance may be exceptionally large.

Rationale: Differentiation among classes or instructors is a primary intent
of Global and General Concept items. The general wording of these items
should permit students to respond to whether or not a given course or
instructor has more or less of each trait rated. Comparison among courses
or instructors is an inevitable outcome accounting for substantial variance.
Specific items, on the other hand, are theoretically only applicable in
those classes where they were selected. This, itself, should account for
a decrease in between-class variance. If Specific item variance is
exceptionally large, it can probably be accounted for by a few extreme
classes or instructors.

Measure: $s_B^2$ (between-class variance of means)


3. Within-class variance among student responses: Specific item variance might
be smaller than either Global or General Concept variance.

Rationale: Specific items are meant to be behaviorally related to specific
occurrences, course materials, and instructor quirks observed or experienced
in the classroom. If all students see the same thing and react the same
way, within-class variance should be small. For Global and General Concept
items, the expectation for common student perception is unlikely.

Measure: $\bar{s}_W^2$ (average within-class variance of responses for an item)

4. Ceiling effect: The negative skewness of Global and General Concept items
produces little discrimination at the top of the distribution. Specific
items should produce a lesser degree of this ceiling effect.

Rationale: To a certain extent the expectation for lower ceilings on
Specific items is built on the premise that instructors will choose items
corresponding to their weaknesses as well as to their strengths. To the
extent that this is true, the hypothesis should be confirmed.

Measures: $(\mathrm{Maximum} - \bar{X})_B$, the maximum weight minus the mean for
the distribution as a whole; and $(\mathrm{Maximum} - C_{90})$, the maximum weight
minus $C_{90}$ for the distribution as a whole.

5. Item reliability: Reliability should be largest for Globals (mid 80's),
followed by General Concept (70's to mid 80's) and Specific (70's or lower).

Rationale: Item reliability as measured here is essentially an index of
discriminating power, and it contains the ratio of within-class variance to
between-class variance. Since between-class variance for Global and General
Concept items is expected to be comparatively larger than that for Specific
items, item reliability should be greater. This larger value of $s_B^2$ more
than makes up for the smaller $s_W^2$ for Specific items.

Measure: Horst Reliability Formula

$$\mathrm{rel} = 1 - \frac{\dfrac{1}{N}\displaystyle\sum_{w=1}^{N} \dfrac{s_w^2}{n_w - 1}}{s_B^2}$$

where $s_w^2$ and $s_B^2$ are defined as before,
$n_w$ = number of students within a class section,
and $N$ = number of class sections.

6. Interquartile range: Due to the larger discriminating power of Global and
General Concept items, the interquartile ranges of these items should be
close to that of a normal distribution. The interquartile ranges of Specific
items should be highly variable.

Rationale: This hypothesis assumes that Global and General Concept item
mean distributions are approximately truncated normal curves. It is
difficult to hypothesize a theoretical family of curves that Specific items
would generally follow. In fact, they may not follow any family.

Measure: $Q_I = C_{75} - C_{25}$
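
The measures named in the six hypotheses can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the computation used in the study: the function and key names are hypothetical, variances are taken as population variances, percentiles use linear interpolation, the maximum weight is assumed to be 5 on a five-point scale, and the Horst formula is assumed to have the form 1 - mean(s_w^2 / (n_w - 1)) / s_B^2.

```python
from statistics import mean, median, pvariance


def percentile(values, p):
    """Percentile by linear interpolation (p in 0..100); an assumed convention."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def item_measures(sections, max_weight=5.0):
    """Descriptive indices for one item.

    sections: list of lists; each inner list holds the responses of one
    class section (at least two students per section, and the section
    means must not all be equal, or the reliability is undefined).
    """
    means = [mean(s) for s in sections]        # class section means
    within = [pvariance(s) for s in sections]  # within-class variances
    s2_b = pvariance(means)                    # between-class variance of means
    # Assumed Horst form: average of s_w^2 / (n_w - 1) over sections, over s_B^2.
    error = mean([v / (len(s) - 1) for v, s in zip(within, sections)])
    return {
        "skew": mean(means) - median(means),   # mean minus median of class means
        "s2_b": s2_b,
        "s2_w": mean(within),                  # average within-class variance
        "ceiling_mean": max_weight - mean(means),
        "ceiling_c90": max_weight - percentile(means, 90),
        "reliability": 1.0 - error / s2_b,
        "q_i": percentile(means, 75) - percentile(means, 25),
    }


# Four made-up class sections rating one item on a 1-5 scale.
example = item_measures([[5, 4, 5, 4], [3, 3, 4, 2], [4, 4, 5, 5], [2, 3, 2, 3]])
```

Other variance or percentile conventions would shift the values slightly; the point of the sketch is only to show how each index depends on the class section means and the within-class spreads.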

Results 

Results of all descriptive indices or measures are displayed in Table 2 for
each item. General Concept and Specific item results are presented separately for
Influence and Security domains. In addition to the measures described above, it was
necessary for summary purposes to add further indices. The rows labeled $s_B$,
$\bar{s}_W$, and $s_B^2/\bar{s}_W^2$ are self-explanatory. Measure ND/I refers to
the ratio of the number of standard deviations in the interquartile range for a
normal distribution (1.349) to the number of standard deviations ($s_B$) in the
interquartile range of the mean distribution for a given item. Numbers greater than
1.0 for ND/I would thus indicate more spread in a normal distribution for the middle
50% of class means than for the item under study. The last row labeled "N" refers
to the number of class sections in which each item was selected. It should also be
noted that in the results for Influence-Specific items 19 and 144 and
Security-Specific items 24, 52, and 12z, the mid-point was the most positive
response. Thus measures $s_W$, $s_B$, and $Q_I$ are probably deflated. "Max" was
chosen to be 3.0, so $(\mathrm{Max} - \bar{X})_B$ has a different meaning from
other items and $(\mathrm{Max} - C_{90})$ has no meaning.

Insert Table 2 About Here
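
The ND/I index described above reduces to a one-line computation. The sketch below follows the verbal definition (1.349 divided by the item's interquartile range expressed in units of the between-class standard deviation); the function and argument names are hypothetical.

```python
NORMAL_IQR_IN_SD = 1.349  # interquartile range of a normal curve, in SD units


def nd_over_i(q_i, s_b):
    """Ratio of the normal-curve IQR (in SDs) to the item's IQR (in SDs).

    q_i: interquartile range of the class section means for the item
    s_b: standard deviation of the class section means for the item
    """
    item_iqr_in_sd = q_i / s_b
    return NORMAL_IQR_IN_SD / item_iqr_in_sd


# An item whose class means have s_B = 0.60 and an interquartile range of 0.90
# spans 1.5 SDs in its middle 50%, so ND/I falls below 1.0 (more spread than normal).
ratio = nd_over_i(q_i=0.90, s_b=0.60)
```

A value above 1.0 means the item's middle 50% of class means is packed more tightly than a normal curve with the same standard deviation would be.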

Table 3 was constructed to assist in summarizing the data in Table 2. Table 3 
contains the medians of selected indices from Table 2 data. Results are presented 
according to hypothesis. 

Insert Table 3 About Here 




1. Skewness in Distributions of Class Section Means

The first measure of skewness ($\bar{X}_B - \mathrm{Mdn}_B$) yielded only minor
differences among item types. Most measures were around -.10 or less. Results for
the second measure, $(C_{90} - C_{50}) - (C_{50} - C_{10})$, did indicate that
Global items were slightly more negatively skewed than General Concept items (see
Table 3). Specific items, on the other hand, fluctuated from slight positive
skewness to very negative. This result may in part be due to the smaller number of
class sections using Specific items.
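
A percentile-based skewness index of the kind used as the second measure can be sketched as follows. The sketch assumes the index is the upper spread (C90 - C50) minus the lower spread (C50 - C10) of the class section means, and it assumes linear-interpolation percentiles; the function names are hypothetical.

```python
def percentile(values, p):
    """Percentile by linear interpolation (p in 0..100); an assumed convention."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_skew(class_means):
    """(C90 - C50) - (C50 - C10): negative when the lower tail is longer."""
    c10, c50, c90 = (percentile(class_means, p) for p in (10, 50, 90))
    return (c90 - c50) - (c50 - c10)


# Made-up class means piled up near the top of a 1-5 scale.
skewed = percentile_skew([1.0, 3.0, 4.0, 4.5, 5.0])
symmetric = percentile_skew([1.0, 2.0, 3.0, 4.0, 5.0])
```

For the piled-up example the index is negative, and for the evenly spaced example it is zero up to rounding, which matches the reading of the index as a sign-and-size indicator of skew.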

2. Between-Class Variance of Means ($s_B^2$)

Except for item #170 there was only slight variability of $s_B^2$ for Global items.
The average $s_B^2$ for Influence-General Concept items was higher than for any other
item grouping (see Table 3). Specific item $s_B^2$ varied considerably among items
(range .121 to .828 for non-mid-point items), and this unpredictability was expected.
If the items selected are representative of "good" Global and General Concept items,
an $s_B$ of .50 appears to indicate a criterion of quality.

3. Within-Class Variance Among Student Responses ($\bar{s}_W^2$)

The average within-class variability ($\bar{s}_W^2$) was found to be lower for
Global items than any other item group. This was contrary to expectations.
Influence-General Concept items yielded the highest average $s_W^2$ and this was
consistent with expectations. It can also be noted from Table 2 that, with the
exception of item #200, all Global and General Concept item $s_W^2$ values were
between .67 and .87, a high degree of consistency. The finding that $\bar{s}_W^2$
for Specific items was little different from $\bar{s}_W^2$ for other items suggests
a rethinking of expectations for Specific items and their associated utility. At
least the mid-point, best-response Specific items yielded the lowest values of
$\bar{s}_W^2$, as might be expected.

4. Ceiling Effect

The influence of greater skewness of Global items also probably accounts for the
greater ceiling effect observed for the measures $(\mathrm{Max} - \bar{X})$ and
$(\mathrm{Max} - C_{90})$. For Global items the room at the top
$(\mathrm{Max} - \bar{X})$ of the class means distributions is generally less than
two standard deviations ($s_B$), while for all other item groups the distance from
the maximum to the mean is at least two $s_B$. It is also worthwhile to note that
Influence-General Concept item means were substantially lower (about 1/2 $s_B$)
than means from all other item groups.

5. Item Reliabilities 

In general there were no differences between Global and General Concept items for
reliability. Except for item #170 the range was .69 to .91, and a median of .80 was
observed for each subgroup. A value of .80 may be a useful criterion for items of
these types because discrimination power is needed. The reliability for Specific
items, on the other hand, fluctuated a great deal and conformed to the hypothesized
low .70's. This may be due in part to the smaller sample sizes on which Specific
item data were based.

6. Interquartile Range 

The results for the interquartile range ($Q_I$) conform to those given for ceiling
effect. That is, Influence-General Concept items had a substantially larger $Q_I$
than did other item groups (see Table 3). In order to get a better idea of how $Q_I$
compares to that expected in a normal curve, the measure ND/I was determined. The
values as given in Table 2 show that Global and General Concept items provide about
as much discrimination in the center of the means distributions as a normal curve.
The range of ND/I with one exception (item #200) was .92 to 1.1. One-half of the
General Concept items had values less than 1.0, indicating slightly more spread than
that for a normal distribution. As may be anticipated, the spread of ND/I values
for Specific items is much larger than that for other item groups, and the number of
values below 1.0 (7) was exceeded by the number above 1.0 (10).

Discussion 

The differentiation among specificity levels of student rating items offers an
essentially content-free classification scheme. It also has some potential long-run
benefits in the evaluation and improvement of college instruction. In order for
these benefits to accrue, item specificity implications have to be thoroughly
investigated. This study represents an attempt to articulate some descriptive
statistical indices related to item specificity implications.

To a large extent, the path for investigating these implications directly
interacts with the investigation of item quality. To illustrate, for Specific items
to have maximum utility in instructional feedback situations, one might assume that
within-class variance be small to assure that faculty development efforts are pointed
in the proper direction. Along the same line, Specific items should be worded so that
growth may be observed over time; high ceilings on item means would prohibit this
observation. Thus, if Specific items do not yield certain statistical properties
related to their theoretical purpose, then more work has to be done to improve the
items or they should be eliminated from use in the classroom.

Interpretation of the above results for Specific items in this context leads one
to question the overall quality of such items selected for this study or to rethink
the hypotheses regarding the statistical behavior of these items. More specifically,
$\bar{s}_W^2$ was higher than anticipated (about equal to $\bar{s}_W^2$ for other
item groups) and the ceiling effect was just as prevalent, if not more so, for
Security-Specific items. The ceiling problem may be due in part to instructors
selecting items they know students will give high ratings. But since results of such
items are not routinely sent to department heads, this behavior does not appear to be
highly rational. On the other hand, the statistical properties selected for use in
this study may not have been completely fair to judge Specific item quality. Perhaps
more attention to within-class distributions would have yielded more positive results.

Taking another approach, it may be useful to hypothesize alternate explanations
for the observed discrepancies. This explanation involves six potential reasons¹
which may account for the observed data. (1) Global items are easier to answer;
they contain a halo effect that Specific items do not possess. This may account for
the lower $\bar{s}_W^2$ obtained for Global items. (2) Responses to all items include
learner differences, cognitive differences and differences in approaches to learning.
These differences may serve to increase variability on Specific items, not decrease it.
(3) The semantics of the responses may be more of a factor in responding to Specific
items than to other item types. (4) Recency effects, those occurrences or
interactions observed or felt in the classroom situation impinging upon the student at
the time he/she is responding to a question, may influence results on Specific items
more so than on others. (5) There exist affective or emotional differences among
students (regardless of the homogeneity of general intellect) which Specific items
may be more likely to elicit and that would also serve to increase variability. (6)
Specific items might also trigger associations with authority and past comparators,
such as former teachers, parents or other acquaintances, more so than other items.
While these six are conjectures not generally researchable, they at least provide a
framework which future investigations may take into account.

¹ The author is indebted to H. Richard Smock for his thoughts on this topic.

Parallel considerations for item quality apply to Global and General Concept
items. Prior arguments have been made in this case for between-class variance and
item reliability. If the items do not satisfy these criteria, they also must be
reworded or eliminated because discrimination among instructors or classes is a
primary function of these items. Such criteria include an $s_B \geq .5$ (5 point
scale), Rel > .80, $(\mathrm{Max} - \bar{X}) \geq 2 s_B$ and ND/I $\leq$ 1.1. These
criteria would appear to permit adequate differentiation among class means. It is
quite probable that these criteria are not standards, but they do permit a starting
point for judging item quality. Most Global and General Concept items included in
this study met and surpassed these criteria.
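
As an illustration of how such starting-point criteria might be applied to a single Global or General Concept item, the sketch below checks each threshold separately. The directions of the inequalities are an assumption here (s_B at least .5, reliability above .80, at least two s_B of room below the scale maximum, and ND/I no greater than 1.1), and the function and key names are hypothetical.

```python
def check_general_item(s_b, rel, max_minus_mean, nd_i):
    """Return a pass/fail flag per criterion for one item on a 5-point scale.

    s_b: standard deviation of class section means
    rel: item reliability
    max_minus_mean: scale maximum minus the mean of class section means
    nd_i: ND/I ratio for the item
    """
    return {
        "between_class_spread": s_b >= 0.5,         # assumed direction
        "reliability": rel > 0.80,
        "room_at_top": max_minus_mean >= 2 * s_b,   # assumed direction
        "central_spread": nd_i <= 1.1,              # assumed direction
    }


# A made-up item: adequate spread and reliability, but a crowded ceiling.
flags = check_general_item(s_b=0.60, rel=0.85, max_minus_mean=1.0, nd_i=1.0)
failed = [name for name, ok in flags.items() if not ok]
```

Reporting each criterion separately, rather than a single pass or fail, fits the paper's caution that these values are a starting point for judging item quality and not fixed standards.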

It is also worthwhile to note that Influence-General Concept items appeared to
behave quite differently from Security-General Concept items. Notable differences
included an $s_B$ 40% larger for the Influence subset and a $(\mathrm{Max} - \bar{X})$
range that was about 1/2 $s_B$ larger. Part of these differences may be due to the
item sample, but most may be accounted for by the general content domains represented
in the items. Perhaps the larger $s_B$ and $(\mathrm{Max} - \bar{X})$ indices observed
are a product of the lesser susceptibility of Influence items to biasing effects.
This is yet another topic for further study.

The experience gained in this study is transferable to the investigation of 
other items in the ICES catalog as well as to the examination of other questionnaires. 
If item purpose (use) is specified together with a judgment of general to specific 
wording used, then some examination of item quality can proceed. The methods 
presented here provide a start for such efforts which in the long run should result 
in better quality student rating data and results more likely to withstand the 
pressures for appropriate use. 






Reference Notes 



1. Rosenshine, Barak and Robert Stevens. Specific college teaching behaviors.
   Mimeographed. University of Illinois College of Education, undated.
2. Brandenburg, Dale C. ICES: Its Rationale and Description. ICES
   Newsletter No. 2, Office of Instructional Resources, University of
   Illinois, August, 1977.






References 

Brandenburg, Dale C., Sharon Derry and Dennis D. Hengstler. Validation of an
Item Classification Scheme for a Student Rating Item Catalog. Paper
presented at the NCME Annual Meeting, March, 1978, Toronto, Canada
(ERIC IMOOJieg). 

Cushman, Harold R. and Frederick K.T. Tom. The Cornell Diagnostic Observation
and Reporting System for Student Description of College Teaching. NACTA
Journal, March, 1976, 10-16.

Frey, Peter W. A two-dimensional analysis of student ratings of instruction.
Research in Higher Education, 1978, 9, 69-91.

Pohlmann, John T. A description of teaching effectiveness as measured by student
ratings. Journal of Educational Measurement, 1975, 12, 49-54.

Rosenshine, Barak. Enthusiastic teaching: A research review. School Review,
1970, 78, 499-514.

Smock, H. Richard and Terence J. Crooks. A plan for the comprehensive evaluation
of college teachers. Journal of Higher Education, 1973, 44, 577-586.



Figure 1. Hierarchical Schematic Classification of Student Rating Items
(figure not reproduced; the ellipses are labeled Influence and Security)



Table 1

Selected Items and Their Hierarchical Classification



GLOBAL

2 -- The instructor stated clearly what was expected of students.
Almost always    Almost never

5 -- Was the progression of the course logical and coherent from beginning to end?
Yes, always    No, seldom

13 -- Was class time spent on unimportant and irrelevant material?
Yes, often    No, never

164 -- Do you feel course objectives were accomplished?
Yes, to a great extent    No, not at all

169 -- Did this course improve your understanding of concepts and principles in this field?
Yes, significantly    No, not much

170 -- Can you now identify main points and central issues in this field?
[...] clearly    [...] well

160 -- How much do you feel you have accomplished in this course?
A great deal    Very little

162 -- How much have you learned in this course?
[...] deal    [...] little

195~Dld your interest in this course Increase 
or decrease as the semester progressed? 
Greatly Greatly 
Increased decreased 



200 — Were you stimulated to discuss related 
topics with friends outside of class? 
Yes, often No, never 

204 — I developed ^.''^pre fiosltlve self- 
concept becai«4e of ^Is cotirse* 
To a grea^ Not at all 
extent 

220 — Compared to cdier courses, how much 
effort did you put into this course? 
Much more Much less 

255 — How interesting were the Instructor's 
presentations? 

Very Rather 
lutecesting boring 

325 — The Instructor motivated me to do my 
best work. 

Almost Almost 
always never 



240 — The instructor was a dynamkc teacher, 
Yes, very No, very 

dynamic dull 



INFLUENCE-GENERAL CONCEPT 

46 — How would you rate Instxnctlonal 
materials used in this course? 
Exciellent Poor 



SECURITY-GENERAL CONCEPT 



3 — The course was: 
Organized 



Dis- 
organized 



4 — Was there ag^reement between announced 
course objectives and what was taught? 
Strong No 
agreement agreement 






Table 1
(continued)

SECURITY-GENERAL CONCEPT (continued)

101 -- The grading procedures for the course were:
         Very fair          Very unfair

105 -- Did the instructor have a realistic definition of excellent performance?
         Yes, very realistic          No, very unrealistic

286 -- The instructor's presentation of abstract ideas, concepts, and theories was:
         Very clear          Very unclear

362 -- The instructor seemed to sense when students did not understand.
         Strongly agree          Strongly disagree

INFLUENCE-SPECIFIC

 19 -- The course content was:
         Too advanced          Too elementary

 50 -- Were readings well selected?
         Yes, all very good          No, all very poor

 63 -- Describe your written assignments.
         Interesting, stimulating          Dull, uninspiring

116 -- Did the exams challenge you to do original thinking?
         Yes, very challenging          No, not challenging

144 -- Describe the pace of the course.
         Too fast          Too slow

328 -- Did the instructor raise challenging questions in class?
         Yes, often          No, seldom

335 -- Did the instructor encourage you to develop your ideas and approaches to problems?
         Definitely yes          Definitely no

382 -- Was a good balance of student participation and instructor contribution achieved?
         Always          Never

SECURITY-SPECIFIC

 24 -- Should more/less time be provided to review and synthesize course material?
         Much more time          Much less time

 52 -- Did the readings require a reasonable amount of time and effort?
         No, too demanding          No, too simple

114 -- The exams reflected important points in the reading assignments.
         Strongly agree          Strongly disagree

122 -- How difficult were the examinations?
         Too difficult          Too easy

265 -- The instructor made use of alternative explanations when needed.
         Almost always          Almost never

340 -- Did the instructor suggest specific ways students could improve?
         Yes, frequently          No, almost never

354 -- The instructor listened attentively to what class members had to say.
         Always          Seldom

378 -- Was the instructor cynical and sarcastic?
         Very cynical          Not at all cynical

381 -- In terms of direction and structure of the course, the instructor was:
         Flexible          Rigid






Table 2

Descriptive Statistics for Selected ICES Items

GLOBAL

Item No.                       2       5      13     164     169     170     195     240
X_B - Mdn_B                 -.09    -.05    -.07    -.10    -.07    -.08    +.01    -.10
(C90-C50)-(C50-C10)_B       -.25    -.12    -.11    -.30    -.21    -.25    +.04    -.27
s_B                          [?]     [?]     [?]     [?]     .55     .38     .60     .66
s_B^2                       .243  .26[?]    .223    .226    .302    .143    .355    .431
s_W                          .73     .74     .71     .67     .81     .82     .70     .74
s_W^2                       .533    .548    .504    .449    .656    .672    .490    .548
s_B^2/s_W^2                  .49     .49     .44     .50     .46     .21     .72     .79
(Max - X)_B                  .80     .91    1.13     .84     .91     .97    1.64    1.04
(Max - C90)_B                .21     .28     .54     .23     .22     .58     .89     .24
(C75 - C25)_B           [only partly legible; column positions uncertain: .?2 .60 .74 .56 .81 .90]
ND/I                         1.0     1.1     1.0     1.1     1.0     .92     1.0     .99
Rel                         .799    .809    .676    .706    .755    .483    .744    .510
N                            786     455     127     175     691     122     183     336

INFLUENCE-GENERAL CONCEPT

Item No.                      46     160     162     200     204     220     255     325
X_B - Mdn_B                 -.01    -.07    -.05    -.10    -.09    +.02    -.10    -.04
(C90-C50)-(C50-C10)_B       -.07    -.19    -.14    -.65    -.49    +.02    -.32    -.16
s_B                          .52     .56     .45     .76     .61     .58     .69     .61
s_B^2                       .266    .313    .199    .578    .373    .335    .474    .372
s_W                          .80     .85     .81    1.03     .82     .83     .84     .87
s_W^2                       .640    .723    .656    1.06    .672    .689    .706    .757
s_B^2/s_W^2                  .42     .43     .30     .55     .56     .49     .67     .49
(Max - X)_B                 1.19    1.07     .91    1.52    1.32    1.47    1.24    1.27
(Max - C90)_B                .58     .39     .38     .73     .70     .70     .45     .46
(C75 - C25)_B                .63     .80     .58     .82     .86     .80     .92     .83
ND/I                         1.1     .94     1.0     1.2     .96     .98     1.0     .99
Rel                         .745    .786    .734    .867    .755    .812    .887    .834
N                            345     435     507      70      54     682     204     575






Table 2 (continued)

Descriptive Statistics for Selected ICES Items

SECURITY-GENERAL CONCEPT

Item No.                       3       4     101     105     286     362
X_B - Mdn_B                 -.04    -.06    -.04    -.05    -.05    -.08
(C90-C50)-(C50-C10)_B       -.11    -.13    -.16    -.14    -.18    -.24
s_B                          .51     .46     .53     .50     .50     .43
s_B^2                       .265    .211    .276    .251    .255    .187
s_W                          .75     .70     .82     .89     .78     .83
s_W^2                       .563    .490    .672    .792    .608    .689
s_B^2/s_W^2                  .47     .43     .41     .32     .42     .27
(Max - X)_B                  .87     .79    1.03    1.19    1.16    1.05
(Max - C90)_B                .24     .24     .39     .55     .50     .55
(C75 - C25)_B                .72     .59     .72     .68     .62     .58
ND/I                         .96     1.1     .99     .99     1.1     1.0
Rel                         .812    .801    .787    .738    .828    .690
N                            525     473    1031     210     194     237

INFLUENCE-SPECIFIC

Item No.                     19*      50      63     116    144*     328     335     382
X_B - Mdn_B                 +.03    -.07    +.04    -.01    -.06    -.10    -.04    -.06
(C90-C50)-(C50-C10)_B        [?]    -.17     [?]     [?]     [?]     [?]     [?]    -.18
s_B                          [?]     [?]     [?]     [?]     [?]     [?]     [?]     .53
s_B^2                       .086    .257    .223    .121    .151    .232    .244    .276
s_W                          .57     .74     .92     .82     .60     .82     .80     .83
s_W^2                       .325    .548    .846    .672    .360    .672    .640    .689
s_B^2/s_W^2                  .26     .47     .26     .18     .42     .35     .38     .40
(Max - X)_B                 -.14    1.07    1.46     .81    -.21     .95     .92    1.20
(Max - C90)_B                  X     .45     .81     .34       X     .40     .32     .55
(C75 - C25)_B                .35     .65     .50     .53     .46     .67     .61     .73
ND/I                         1.1     1.1     1.3     .89     1.1     .97     1.1     1.0
Rel                         .547    .720    .722    .695    .736    .776    .746    .816
N                            364     111      79      71     253     216     125     352

*These items were scored so that the most positive response was '3' rather than '5' as for other items.



Table 2 (continued)

Descriptive Statistics for Selected ICES Items

SECURITY-SPECIFIC

Item No.                     24*     52*     114    122*     265     340     354     378     381
X_B - Mdn_B                 +.02    -.05    -.34    -.02    -.05    -.10    -.11    -.22    -.02
(C90-C50)-(C50-C10)_B       -.07    -.09    +.10     .00    -.15    -.22    -.30   -1.00    -.04
s_B                          .35     .29     .67     .36     .41     .58     .42     .91     .36
s_B^2                       .124    .084    .450    .129    .168    .332    .173    .828    .127
s_W                          .71     .65     .90     .64     .77     .78     .69     .80     .79
s_W^2                       .504    .423    .810    .410    .593    .608    .476    .640    .624
s_B^2/s_W^2                  .24     .20     .56     .31     .28     .55     .36     1.3     .20
(Max - X)_B                 -.60    -.44    1.35    -.40     .95     .88     .47    1.04     .39
(Max - C90)_B                  X       X     .60       X     .46     .17     .06     .14     .36
(C75 - C25)_B           [largely illegible in source; legible fragments: .?5 .54]
ND/I                         1.0     .87     .81     1.1     1.0     .92     1.2     .85     .90
Rel                         .678    .643    .845    .814    .685    .796    .723    .920    .560
N                            163      92      49     150      97      98     247      23      47

*These items were scored so that the most positive response was '3' rather than '5' as for other items.



Table 3

Medians of Selected Descriptive Indices

Item                                   (C90-C50)-  Interquartile
Classification                  X-Mdn  (C50-C10)   Range*   Max-X*  Max-C90*  s_W^2*   s_B^2     Rel  s_B^2/s_W^2

Global                           -.07     -.23       .66      .94      .28      .54     .27     .80      .49
Influence-General Concept        -.05     -.17       .81     1.26      .52      .70     .35     .80      .49
Security-General Concept         -.05     -.15       .65     1.04      .44      .65     .25     .80      .41
Influence-Specific               -.05     -.12       .63     1.01      .42      .68     .23     .73      .36
Security-Specific                -.05     -.09       .55      .95      .26      .62     .17     .72      .31

*Items with a most positive response of 3.0 were omitted from these measures.
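
As an illustration only, the kind of indices reported in Tables 2 and 3 could be computed from raw responses roughly as sketched below. This is not the author's procedure: the function names `percentile` and `item_indices` are hypothetical, the percentile interpolation rule and the pooled within-section variance are assumptions, and the form of the percentile skewness index, (C90 - C50) - (C50 - C10), is read from the table labels rather than from a stated definition. The Rel and ND/I rows are not reproduced because their formulas are not given here.

```python
import statistics


def percentile(sorted_vals, p):
    """Percentile p (0-100) of a pre-sorted list, by linear interpolation (assumed rule)."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def item_indices(sections, max_score=5):
    """Illustrative indices for one rating item.

    sections: list of lists; each inner list holds one class section's
    responses (at least two sections, at least two responses per section).
    """
    # Distribution of class section means
    means = sorted(statistics.mean(s) for s in sections)
    c10, c25, c50, c75, c90 = (percentile(means, p) for p in (10, 25, 50, 75, 90))
    mean_of_means = statistics.mean(means)

    # Between-class variance of the section means
    s2_between = statistics.variance(means)

    # Within-class variance, pooled over sections by degrees of freedom (assumption)
    df_within = sum(len(s) - 1 for s in sections)
    s2_within = sum((len(s) - 1) * statistics.variance(s) for s in sections) / df_within

    return {
        "mean_minus_median": mean_of_means - c50,        # skewness: X - Mdn
        "percentile_skew": (c90 - c50) - (c50 - c10),    # skewness from percentiles
        "s2_between": s2_between,
        "s2_within": s2_within,
        "variance_ratio": s2_between / s2_within,
        "max_minus_mean": max_score - mean_of_means,     # ceiling effect: Max - X
        "max_minus_c90": max_score - c90,                # ceiling effect: Max - C90
        "interquartile_range": c75 - c25,
    }


# Three made-up sections of four students each
demo = item_indices([[5, 5, 4, 4], [4, 4, 3, 3], [3, 3, 2, 2]])
print(round(demo["variance_ratio"], 2))
```

For items whose most positive response is 3 rather than 5, `max_score=3` would be passed instead; the tables mark or omit the ceiling indices for such items.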






